智能论文笔记

Specification sketching for Linear Temporal Logic

Simon Lutz , Daniel Neider , Rajarshi Roy

分类：人工智能

2022-06-14

实际上，所有验证和综合技术都假定正式规格很容易获得，在功能上正确并完全匹配工程师对给定系统的理解。但是，在实践中，这种假设通常是不现实的：正式化系统要求非常困难，容易出错，并且需要大量的培训。为了减轻这一严重的障碍，我们提出了一种从根本上新颖的编写形式规范的方法，称为线性时间逻辑（LTL）的规范草图。关键的想法是，工程师可以提供部分LTL公式，称为LTL草图，在该公式中很难形式化。给定一组描述规范应该或不应允许的系统行为的示例，然后将所谓的草图算法的任务完成给定的草图，以使所得的LTL公式与示例一致。我们表明，决定是否可以完成草图属于复杂性NP，并呈现两个基于SAT的草图算法。我们还证明，素描是使用原型实现编写形式规格的实用方法。

translated by 谷歌翻译

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

Huaizheng Zhang , Yuanming Li , Wencong Xiao , Yizheng Huang , Xing Di , Jianxiong Yin , Simon See , Yong Luo , Chiew Tong Lau , Yang You

分类：机器学习

2023-01-01

New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.

translated by 谷歌翻译

Broad Learning System with Takagi-Sugeno Fuzzy Subsystem for Tobacco Origin Identification based on Near Infrared Spectroscopy

Di Wang , Simon X. Yang

分类：机器学习 | 人工智能

2022-12-31

Tobacco origin identification is significantly important in tobacco industry. Modeling analysis for sensor data with near infrared spectroscopy has become a popular method for rapid detection of internal features. However, for sensor data analysis using traditional artificial neural network or deep network models, the training process is extremely time-consuming. In this paper, a novel broad learning system with Takagi-Sugeno (TS) fuzzy subsystem is proposed for rapid identification of tobacco origin. Incremental learning is employed in the proposed method, which obtains the weight matrix of the network after a very small amount of computation, resulting in much shorter training time for the model, with only about 3 seconds for the extra step training. The experimental results show that the TS fuzzy subsystem can extract features from the near infrared data and effectively improve the recognition performance. The proposed method can achieve the highest prediction accuracy (95.59 %) in comparison to the traditional classification algorithms, artificial neural network, and deep convolutional neural network, and has a great advantage in the training time with only about 128 seconds.

translated by 谷歌翻译

Properties of Group Fairness Metrics for Rankings

Tobias Schumacher , Marlene Lutz , Sandipan Sikdar , Markus Strohmaier

分类：机器学习

2022-12-29

In recent years, several metrics have been developed for evaluating group fairness of rankings. Given that these metrics were developed with different application contexts and ranking algorithms in mind, it is not straightforward which metric to choose for a given scenario. In this paper, we perform a comprehensive comparative analysis of existing group fairness metrics developed in the context of fair ranking. By virtue of their diverse application contexts, we argue that such a comparative analysis is not straightforward. Hence, we take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics that consider different ranking settings. A metric can then be selected depending on whether it satisfies all or a subset of these properties. We apply these properties on eleven existing group fairness metrics, and through both empirical and theoretical results we demonstrate that most of these metrics only satisfy a small subset of the proposed properties. These findings highlight limitations of existing metrics, and provide insights into how to evaluate and interpret different fairness metrics in practical deployment. The proposed properties can also assist practitioners in selecting appropriate metrics for evaluating fairness in a specific application.

translated by 谷歌翻译

Towards Improved Prediction of Ship Performance: A Comparative Analysis on In-service Ship Monitoring Data for Modeling the Speed-Power Relation

Simon DeKeyser , Casimir Morobé , Malte Mittendorf

分类：机器学习

2022-12-26

Accurate modeling of ship performance is crucial for the shipping industry to optimize fuel consumption and subsequently reduce emissions. However, predicting the speed-power relation in real-world conditions remains a challenge. In this study, we used in-service monitoring data from multiple vessels with different hull shapes to compare the accuracy of data-driven machine learning (ML) algorithms to traditional methods for assessing ship performance. Our analysis consists of two main parts: (1) a comparison of sea trial curves with calm-water curves fitted on operational data, and (2) a benchmark of multiple added wave resistance theories with an ML-based approach. Our results showed that a simple neural network outperformed established semi-empirical formulas following first principles. The neural network only required operational data as input, while the traditional methods required extensive ship particulars that are often unavailable. These findings suggest that data-driven algorithms may be more effective for predicting ship performance in practical applications.

translated by 谷歌翻译

Intelligent Feature Extraction, Data Fusion and Detection of Concrete Bridge Cracks: Current Development and Challenges

Di Wang , Simon X. Yang

分类：机器学习 | 人工智能

2022-12-24

As a common appearance defect of concrete bridges, cracks are important indices for bridge structure health assessment. Although there has been much research on crack identification, research on the evolution mechanism of bridge cracks is still far from practical applications. In this paper, the state-of-the-art research on intelligent theories and methodologies for intelligent feature extraction, data fusion and crack detection based on data-driven approaches is comprehensively reviewed. The research is discussed from three aspects: the feature extraction level of the multimodal parameters of bridge cracks, the description level and the diagnosis level of the bridge crack damage states. We focus on previous research concerning the quantitative characterization problems of multimodal parameters of bridge cracks and their implementation in crack identification, while highlighting some of their major drawbacks. In addition, the current challenges and potential future research directions are discussed.

translated by 谷歌翻译

A Semantic Framework for Neural-Symbolic Computing

Simon Odense , Artur d'Avila Garcez

分类：人工智能

2022-12-22

Two approaches to AI, neural networks and symbolic systems, have been proven very successful for an array of AI problems. However, neither has been able to achieve the general reasoning ability required for human-like intelligence. It has been argued that this is due to inherent weaknesses in each approach. Luckily, these weaknesses appear to be complementary, with symbolic systems being adept at the kinds of things neural networks have trouble with and vice-versa. The field of neural-symbolic AI attempts to exploit this asymmetry by combining neural networks and symbolic AI into integrated systems. Often this has been done by encoding symbolic knowledge into neural networks. Unfortunately, although many different methods for this have been proposed, there is no common definition of an encoding to compare them. We seek to rectify this problem by introducing a semantic framework for neural-symbolic AI, which is then shown to be general enough to account for a large family of neural-symbolic systems. We provide a number of examples and proofs of the application of the framework to the neural encoding of various forms of knowledge representation and neural network. These, at first sight disparate approaches, are all shown to fall within the framework's formal definition of what we call semantic encoding for neural-symbolic AI.

translated by 谷歌翻译

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Jean-Marie Lemercier , Julius Richter , Simon Welker , Timo Gerkmann

分类：机器学习

2022-12-22

Diffusion models have shown a great ability at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly as they require to run a neural network for each reverse diffusion step, whereas predictive approaches only require one pass. As diffusion models are generative approaches they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables to use lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus lifting the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).

translated by 谷歌翻译

What do LLMs Know about Financial Markets? A Case Study on Reddit Market Sentiment Analysis

Xiang Deng , Vasilisa Bashlovkina , Feng Han , Simon Baumgartner , Michael Bendersky

分类：自然语言处理 | 人工智能 | 机器学习

2022-12-21

Market sentiment analysis on social media content requires knowledge of both financial markets and social media jargon, which makes it a challenging task for human raters. The resulting lack of high-quality labeled data stands in the way of conventional supervised learning methods. Instead, we approach this problem using semi-supervised learning with a large language model (LLM). Our pipeline generates weak financial sentiment labels for Reddit posts with an LLM and then uses that data to train a small model that can be served in production. We find that prompting the LLM to produce Chain-of-Thought summaries and forcing it through several reasoning paths helps generate more stable and accurate labels, while using a regression loss further improves distillation quality. With only a handful of prompts, the final model performs on par with existing supervised models. Though production applications of our model are limited by ethical considerations, the model's competitive performance points to the great potential of using LLMs for tasks that otherwise require skill-intensive annotation.

translated by 谷歌翻译

Temporal Disaggregation of the Cumulative Grass Growth

Thomas Guyet , Laurent Spillemaecker , Simon Malinowski , Anne-Isabelle Graux

分类：机器学习 | 人工智能

2022-12-21

Information on the grass growth over a year is essential for some models simulating the use of this resource to feed animals on pasture or at barn with hay or grass silage. Unfortunately, this information is rarely available. The challenge is to reconstruct grass growth from two sources of information: usual daily climate data (rainfall, radiation, etc.) and cumulative growth over the year. We have to be able to capture the effect of seasonal climatic events which are known to distort the growth curve within the year. In this paper, we formulate this challenge as a problem of disaggregating the cumulative growth into a time series. To address this problem, our method applies time series forecasting using climate information and grass growth from previous time steps. Several alternatives of the method are proposed and compared experimentally using a database generated from a grassland process-based model. The results show that our method can accurately reconstruct the time series, independently of the use of the cumulative growth information.

translated by 谷歌翻译